MDPs

Over the next several videos, you'll learn how to rigorously define a reinforcement learning problem as a Markov Decision Process (MDP).

Towards this goal, we'll begin with an example!

MDPs, Part 1

## Notes

In general, the state space $\mathcal{S}$ is the set of all nonterminal states.

In continuing tasks (like the recycling task detailed in the video), there are no terminal states, so the state space is the same as the set of all states.

In episodic tasks, we use $\mathcal{S}^+$ to refer to the set of all states, including terminal states.
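As a minimal sketch, the distinction between the nonterminal states and the set of all states can be written with plain Python sets. The state labels here are hypothetical and chosen only for illustration: `"high"` and `"low"` assume the recycling task tracks two battery levels, and the episodic task with `"start"`, `"middle"`, and `"goal"` is invented.

```python
# Hypothetical state labels, used only to illustrate the definitions.

# Continuing task: there are no terminal states.
continuing_nonterminal = frozenset({"high", "low"})
continuing_terminal = frozenset()

# Episodic task: terminal states are in the set of all states,
# but not in the state space of nonterminal states.
episodic_nonterminal = frozenset({"start", "middle"})
episodic_terminal = frozenset({"goal"})


def all_states(nonterminal, terminal):
    """Return the set of all states: nonterminal plus terminal states."""
    return nonterminal | terminal


continuing_S_plus = all_states(continuing_nonterminal, continuing_terminal)
episodic_S_plus = all_states(episodic_nonterminal, episodic_terminal)

# In the continuing task the two sets coincide; in the episodic task they differ.
print(continuing_S_plus == continuing_nonterminal)  # True
print(episodic_S_plus == episodic_nonterminal)      # False
```

Using `frozenset` keeps the sets immutable, which fits a state space that is fixed once the problem is defined.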

The action space $\mathcal{A}$ is the set of possible actions available to the agent.

In the event that there are some states where only a subset of the actions is available, we can also use $\mathcal{A}(s)$ to refer to the set of actions available in state $s \in \mathcal{S}$.
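One way to picture state-dependent action sets is a mapping from each nonterminal state to its available actions. The sketch below assumes states and actions in the style of the textbook recycling robot (battery levels `"high"` and `"low"`, actions `"search"`, `"wait"`, `"recharge"`, with recharging unavailable at high battery). The exact formulation in the video may differ, and the names `ACTIONS`, `AVAILABLE_ACTIONS`, and `available_actions` are hypothetical.

```python
# Assumed for illustration: a recycling-robot-style task.
# The full action space: every action the agent could ever take.
ACTIONS = frozenset({"search", "wait", "recharge"})

# The available actions per nonterminal state.
# Assumption: recharging is not offered when the battery is already high.
AVAILABLE_ACTIONS = {
    "high": frozenset({"search", "wait"}),
    "low": frozenset({"search", "wait", "recharge"}),
}


def available_actions(state):
    """Return the actions available in `state`; raises KeyError for unknown states."""
    return AVAILABLE_ACTIONS[state]


# Sanity check: every state's action set is a subset of the full action space.
assert all(actions <= ACTIONS for actions in AVAILABLE_ACTIONS.values())

print(sorted(available_actions("high")))  # ['search', 'wait']
```

An agent written against this sketch would sample only from `available_actions(state)` rather than from `ACTIONS`, so it never attempts an action that is undefined in its current state.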